The Lepidoptera of North America Network, or LepNet, is a digitization effort recently launched to mobilize biodiversity 
data from 3 million specimens of butterflies and moths in United States natural history collections 
(http://www.lep-net.org/). LepNet was initially conceived as a North American effort but the project seeks collaborations with museums 
and other organizations worldwide. The overall goal is to transform Lepidoptera specimen data into readily available 
digital formats to foster global research in taxonomy, ecology and evolutionary biology. 
Why Lepidoptera of North America? Lepidoptera are disproportionately common in both large and small research 
collections across the continent, with an estimated total of 17 million specimens, and their popularity has made them 
among the best known and most collected of all insects worldwide (Kawahara & Pyle 2012). Although their abundance 
in natural history collections is high, the amount of digitized specimen data for Lepidoptera is proportionately low 
(particularly in comparison to plants and vertebrates). LepNet aims to rectify this deficiency, as butterflies and moths are 
an obvious group to support data-driven research. 

Insect herbivores and their host plants dominate terrestrial biomes, and as both herbivores and pollinators, Lepidoptera 
are one of the most important insect orders linked to the radiation of flowering plants (e.g., Menken et al. 2009). They are also 
among the most diverse and publicly recognizable insects, with more than 157,000 described species globally (e.g., van 
Nieukerken 2011) and over 14,000 species in 86 families in North America alone. Lepidoptera are among the most impactful 
groups of insect pests to agriculture, and are the primary source of food for many vertebrates as well as hosts of parasitoid 
wasps and flies (e.g., Wagner 2001). Lepidoptera have served as model systems for studies of genetics, physiology, 
development, and many aspects of ecology and evolutionary biology including insect/plant coevolution, conservation biology 
and biogeography (e.g., Monteiro & Pierce 2001, Als et al. 2004, Braby et al. 2006, Canfield et al. 2008, Roe et al. 2009, Vila 
et al. 2011, Talavera et al. 2013). Since the pioneering work of Ehrlich and Raven (1964) on the coevolution of butterflies and 
their hosts, there has been great interest in trying to detect and understand macroevolutionary patterns in insect-plant 
associations (review by Janz 2011). Recent studies are using morphological and/or molecular data from collections to 
construct phylogenies of Lepidoptera and determine the evolution of host associations, as well as examine continental scale 
data to understand broad patterns in herbivore community dynamics (e.g., Hamback et al. 2007, Ockinger et al. 2010). 

We have the opportunity to enable data synthesis research with the largest clade of herbivores, and the data are ready 
and waiting on specimens and their labels in natural history collections. At present, fewer than 10 percent of the North 
American species have sufficiently accessible digital occurrence data to enable such research. Moreover, public policy 
and expenditures are currently being driven by incomplete and/or geographically constrained studies. Thus, integration 
of existing but currently unconnected digitization efforts in North America is urgently needed to realize the outstanding 
potential of Lepidoptera and translate the data into transformative research and outreach (Cobb et al. 2016). 

Under the lead of Northern Arizona University, LepNet currently comprises 27 collaborating collections across 26 
US states (Figure 1) with collective holdings of approximately 6 million Lepidoptera specimens. Funding for LepNet is 
being provided from 2016 to 2020 by the National Science Foundation (NSF) through its Advancing Digitization of 
Biodiversity Collections (ADBC) program. The ADBC funds two main initiatives: a national hub that coordinates US 
collections digitization efforts, and specific collaborative digitization projects called Thematic Collections Networks 
(TCNs). LepNet is one of 18 currently active TCNs, which range widely in research focus. Integrated Digitized 
Biocollections (iDigBio) is the national ADBC hub and is based at the University of Florida. iDigBio supports national 
digitization by fostering partnerships and integrations among TCN participants, establishing best practices and 
workflows, disseminating the data and products generated, and promoting use of biodiversity data by the scientific 
community and other stakeholders. Among the TCNs, LepNet is well positioned to engage a broadly representative 
community of researchers and to serve as a model for additional large arthropod-based digitization projects. Although the 
ADBC program has achieved considerable and rapid progress, the challenge with Lepidoptera reflects the obstacles 
involved in digitizing the 287 million arthropod specimens estimated to reside in North American collections. Of the 
74.7 million existing iDigBio records, only 15 percent are arthropods, and Lepidoptera currently comprise 838,000 
records of those, which is just 1 percent of the overall iDigBio total (data as of 15 January 2017). 
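
The proportions quoted above can be rechecked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures stated in the text; the variable names are illustrative and are not part of any iDigBio tool.

```python
# Figures quoted in the text (iDigBio, data as of 15 January 2017).
TOTAL_RECORDS = 74_700_000       # all iDigBio records
ARTHROPOD_SHARE = 0.15           # fraction of records that are arthropods
LEPIDOPTERA_RECORDS = 838_000    # Lepidoptera records among the arthropods

# Derived quantities.
arthropod_records = TOTAL_RECORDS * ARTHROPOD_SHARE
lep_share_of_total = LEPIDOPTERA_RECORDS / TOTAL_RECORDS
lep_share_of_arthropods = LEPIDOPTERA_RECORDS / arthropod_records

print(f"Arthropod records: {arthropod_records:,.0f}")
print(f"Lepidoptera share of all records: {lep_share_of_total:.1%}")
print(f"Lepidoptera share of arthropod records: {lep_share_of_arthropods:.1%}")
```

Run as written, this gives roughly 11.2 million arthropod records, with Lepidoptera at about 1.1 percent of all iDigBio records and about 7.5 percent of the arthropod records.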

LepNet has four interrelated goals for digitization. The first goal is databasing and mobilizing label information 
from over 1.7 million specimens in the 27 partnering collections, and integrating these with more than 1 million existing 
records. Databasing in LepNet consists of digitally capturing the information from specimen labels and georeferencing 
as many as possible of the geographic localities represented, using a combination of automated and manual pipelines. 
The second goal is databasing 35,000 larval samples with associated host plant data, which will mark the first significant 
digitization of Lepidoptera immatures from North American collections. The third goal is generating 214,000 images of 
Lepidoptera specimens. Approximately 80,000 will be high-resolution (digital single-lens reflex) camera images, at a 
species-exemplar level, that will document over half of the Lepidoptera species diversity in North America. The 
remainder will be smartphone-quality images for selected lepidopteran groups, which will be represented by larger 
numbers of images per species; these will include butterflies, distinctive moths, and species exhibiting geographic 
variation. Collectively these digitization targets are aimed at elevating a projected 5,000 North American Lepidoptera 
species to a "research ready" status suitable for complex, data-driven analyses. The fourth goal is making all images and 
data available online and as downloadable datasets through the LepNet, iDigBio, and GBIF data portals. All 
three portals will frequently be updated with new data from collections to keep the public datasets current. 
FIGURE 1. Geographic distribution of the 27 partner institutions in LepNet and their digitization targets for specimen label data 
(blue) and specimen images (red). 
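
The databasing step in the first goal (capturing label information and georeferencing localities) can be pictured as filling in a structured record per specimen. The sketch below is only an illustration under stated assumptions: the field names are modeled on Darwin Core terms (catalogNumber, scientificName, verbatimLocality, eventDate, decimalLatitude, decimalLongitude), which the text does not name, and the class, function, and example values are hypothetical placeholders rather than LepNet's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One digitized specimen label (hypothetical structure, Darwin Core-like names)."""
    catalog_number: str
    scientific_name: str
    verbatim_locality: str                      # locality text as written on the label
    event_date: Optional[str] = None            # ISO 8601 date string, if legible
    decimal_latitude: Optional[float] = None    # filled in by georeferencing
    decimal_longitude: Optional[float] = None

    def is_georeferenced(self) -> bool:
        """True when both coordinates are present and within valid ranges."""
        if self.decimal_latitude is None or self.decimal_longitude is None:
            return False
        return (-90.0 <= self.decimal_latitude <= 90.0
                and -180.0 <= self.decimal_longitude <= 180.0)


def georeferenced_fraction(records: List[SpecimenRecord]) -> float:
    """Fraction of records that carry usable coordinates."""
    if not records:
        return 0.0
    return sum(r.is_georeferenced() for r in records) / len(records)


# Placeholder records: identifiers, localities, and coordinates are invented.
records = [
    SpecimenRecord("EXAMPLE-0001", "Danaus plexippus", "Hypothetical locality A",
                   event_date="1975-07-04",
                   decimal_latitude=35.0, decimal_longitude=-111.5),
    SpecimenRecord("EXAMPLE-0002", "Danaus plexippus", "Hypothetical locality B"),
]
fraction = georeferenced_fraction(records)
print(f"Georeferenced: {fraction:.0%}")
```

Keeping the verbatim locality alongside the derived coordinates means a record remains useful before georeferencing, and lets the coordinates be revisited later without re-reading the label.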

To achieve the LepNet goals, the 27 partnering institutions are using a variety of existing successful databasing and 
imaging solutions in their respective collections, while adhering to a set of common standards and principles developed 
through iDigBio and prior TCN experience. In addition, LepNet is using an integrated Symbiota software portal (Gries et 
al. 2014; http://symbiota4.acis.ufl.edu/scan/lepnet/portal/index.php) that allows direct input of specimen data and images 
by those partner institutions lacking suitable digital infrastructure. LepNet data from all 27 partners are aggregated in the 
Symbiota portal, and this portal in turn provisions iDigBio and GBIF. 

LepNet is also collaborating with Visipedia (http://www.visipedia.org) and Fieldguide (https://fieldguide.net) in 
developing LepSnap, a computer vision project that includes a smartphone app and an identification widget incorporated 
into websites (van Horn et al. 2015). LepSnap will facilitate taxonomic identifications of images and provide tools for 
improved image searching. LepSnap is an insect-based extension of the Visipedia collaboration with the Cornell 
Laboratory of Ornithology (Merlin). In a pilot project with 62 species of tiger moths (Arctiinae) from the Pacific 
Northwest, over 75 percent of the tested images were correctly identified to the respective species with 80-100 percent 
probability based solely on a photo. Accuracy is expected to improve as LepNet produces more training images per 
species and data-driven ecological filters are assembled (e.g., phenology, geography). The capacity to identify specimens 
quickly and efficiently, simply by taking a photo or dragging an image file onto the website widget and submitting it to 
LepSnap for identification, will greatly aid the LepNet contributor and user communities in providing determinations 
and improving data usability. 

We estimate that there are approximately 6 million fully curated and ready-to-database Lepidoptera specimens 
housed among the 27 LepNet partner collections, about 4 million of which are from North America. With the seed 
funding provided by the NSF-ADBC program we expect most of these specimens to be digitized in the next 10 years. 
LepNet is therefore proposing to database a significant share of the estimated 17 million specimens held in North 
American collections. The ADBC effort will cover 90 percent of the Lepidoptera specimens in the 22 smaller partner 
collections, and up to 50 percent of the five largest partner collections. We expect to sponsor at least four additional 
proposals from museums that want to become funded LepNet partners through the NSF ADBC "Partners to Existing 
Networks" (PEN) program. The PEN projects should yield an additional 500,000 records by 2020. LepNet will be 
digitizing Lepidoptera in proportion to the number of specimens housed in the partner collections. In order to obtain 
sufficient data across all 86 Lepidoptera families, our approach is skewed towards moths, which comprise 94 percent of 
North American lepidopteran species-level diversity; butterflies comprise just 6 percent of our fauna, but represent 40 
percent of the existing records in iDigBio. Despite this imbalance, we will digitize butterflies in somewhat greater 
proportion than their diversity, in part because the life histories of butterflies are better known than those of any other group of 
arthropods and they are exceedingly popular in outreach, in citizen science and with the public. Overall, we are targeting 
approximately 1.2 million records for moths and 500,000 records for butterflies (Figure 2). 
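
The balance between moths and butterflies in these targets can be made concrete with a short calculation. This is a minimal sketch in Python using only the figures given in the text; the variable names are illustrative.

```python
# Targets and shares quoted in the text.
MOTH_TARGET = 1_200_000            # targeted moth records
BUTTERFLY_TARGET = 500_000         # targeted butterfly records
BUTTERFLY_DIVERSITY_SHARE = 0.06   # butterflies as a share of North American species
BUTTERFLY_EXISTING_SHARE = 0.40    # butterflies as a share of existing iDigBio records

total_target = MOTH_TARGET + BUTTERFLY_TARGET
butterfly_target_share = BUTTERFLY_TARGET / total_target

# How strongly butterflies are over-represented relative to their species diversity.
overrepresentation = butterfly_target_share / BUTTERFLY_DIVERSITY_SHARE

print(f"Total targeted records: {total_target:,}")
print(f"Butterfly share of targets: {butterfly_target_share:.1%}")
print(f"Over-representation vs. diversity: {overrepresentation:.1f}x")
```

Under these figures the targets total 1.7 million records, of which butterflies make up about 29 percent. That is roughly 4.9 times their 6 percent share of species diversity, but still below their 40 percent share of existing iDigBio records.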

The expected impacts of LepNet beyond enabling biodiversity research will be substantial. The charisma of 
butterflies and moths profoundly inspires children and adults, and promotes public understanding of their relevance like 
no other arthropod group. The visibility and beauty of many moths and butterflies have captivated amateur collectors and 
professional entomologists for centuries, thus providing a unique foundation for influencing education, public 
awareness, and conservation (Kawahara & Pyle 2012). 

Collectively, the LepNet partner institutions operate 67 existing outreach and education programs, including summer 
camps, annual events, and workshops that reach a diverse nationwide audience of 2.5 million people per year. Many of these 
programs target non-traditional and underserved audiences beyond the university setting. During the grant period, LepNet's 
educational program “Explore more with LepXplor” will connect marginalized learners in formal/informal settings to 
Lepidoptera data, collections and related key concepts (e.g., host plant associations). Participants will learn about 
Lepidoptera diversity via an augmented reality (AR) application that overlays digital information about specimen 
collections via tangible AR cards. This specimen-based learning tool aims to promote both STEM literacy and English 
acquisition for English language learners in adult literacy programs and K-12 classrooms. English language learners often 
struggle with academic language, and the integration of immersive technologies can directly impact academic achievement 
and motivation in learning (Solak & Cakir 2015). It is anticipated that more than 400 students and 3,500 volunteers will be 
involved directly with LepXplor, with public engagement channeled through the respective institutional outreach programs 
(e.g., collaborating with educators to integrate curriculum with United States federal and state-level education standards). 
These efforts will simultaneously emphasize the importance of scientific natural history collections in supporting collection- 
based learning experiences, and foster understanding of environmental impacts, climate change and biodiversity. 
FIGURE 2. Existing (black) and expected (blue, yellow, and green) number of records to be produced by LepNet for the 3 major 
groups and 15 families of Lepidoptera that are most common in the collections of the partner institutions. Existing records were 
obtained from the SCAN and iDigBio data portals. 